1. INTRODUCTION

Our growing dependence on networks has inspired a burst of research
activity in the field of network science. One focus of this research is
to derive network models capable of explaining common structural
characteristics of large real networks, such as the Internet, social
networks, and many other complex networks [1]. A particular goal is to
understand how these characteristics affect the various processes that
run on top of these networks, such as routing, information sharing, data
distribution, searching, and epidemics [1]. Understanding the mechanisms
that shape the structure and drive the evolution of real networks can
also have important applications in designing more efficient recommender
and collaborative filtering systems [2], and in predicting missing and
future links, an important problem in many disciplines [3].

Krioukov et al. [4] have shown that there are intrinsic connections
between complex network topologies and hyperbolic geometry, since the
former exhibit hierarchical, tree-like organization, while the latter is
the geometry of trees [5]. Following this work, Papadopoulos et al. [6]
have recently shown that trade-offs between popularity and similarity
shape the structure and dynamics of growing complex networks, and that
these trade-offs in network dynamics give rise to hyperbolic geometry.
The work in [6] introduces a simple model for constructing synthetic
growing networks in the hyperbolic plane, which simultaneously exhibit
many common structural and dynamical characteristics of real networks.
We call the model of [6] the Popularity × Similarity Optimization (PSO)
model.

Given the ability of the PSO model to construct synthetic growing 
networks that resemble real networks across a wide range of structural 
and dynamical characteristics, an interesting question is whether one 
can reverse the synthesis, and given a real network, map (embed) the 
network into the hyperbolic plane, in a way congruent with the PSO 
model. Our main contribution in this work is an affirmative answer to 
this question and a systematic framework that accomplishes this task, 
by replaying the network's geometric growth. The proposed frame- 
work, called HyperMap, is quite simple and it is supported by theoret- 
ical analysis. We apply this framework to the Autonomous Systems 
(AS) topology of the real Internet and show that it produces meaning- 
ful results, identifying communities of ASs that belong to the same 
geographic region. Further, we show that the proposed framework has 
a remarkable predictive power, demonstrated by its ability to predict 
missing links with high precision. While we consider here the AS
Internet topology and the prediction of missing links [3], there are also
other interesting areas where the proposed framework could find
applications, e.g., in community detection [1], and in the prediction of
future links.

2. PRELIMINARIES: THE PSO MODEL 

The PSO model [6] constructs a growing network up to $t > 0$ nodes as
follows: (1) initially the network is empty; (2) at time $1 \le i \le t$,
new node $i$ appears having coordinates $(r_i, \theta_i)$, where
$r_i = 2 \ln i$, while $\theta_i$ is uniformly distributed on
$[0, 2\pi]$, and every existing node $j$, $j < i$, moves increasing its
radial coordinate according to $r_j(i) = \beta r_j + (1 - \beta) r_i$
with parameter $\beta \in [0, 1]$; and (3) node $i$ looks at every
existing node $j$, $j < i$, and connects to it with probability
$p(x_{ji})$, a decreasing function of $x_{ji}$ (see [6] for its
expression), where $x_{ji}$ is the hyperbolic distance between nodes $j$
and $i$. The parameter $m$ is the average number of existing nodes that a
new node connects to, defining the average node degree in the network
$\bar{k} = 2m$. Parameter $\beta \in [0, 1]$ is a function of the
exponent $\gamma > 2$ of the target power law degree distribution
$P(k) \sim k^{-\gamma}$, $\beta = 1/(\gamma - 1)$. Finally, model
parameter $T \in [0, 1)$ is called temperature, and controls the average
clustering in the network: clustering is maximized at $T = 0$, and it
decreases to zero as $T \to 1$. To construct a network up to $t$ nodes we
need to specify $m$, $\beta$, and $T$.
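The radial dynamics of step (2) can be illustrated with a minimal Python sketch. It assumes the PSO relation $\beta = 1/(\gamma - 1)$ and the birth radius $r_j = 2 \ln j$; the function names are illustrative and do not come from [6].

```python
import math


def beta_from_gamma(gamma: float) -> float:
    """Return beta = 1 / (gamma - 1), assuming gamma > 2 as in the text."""
    if gamma <= 2:
        raise ValueError("gamma must exceed 2")
    return 1.0 / (gamma - 1.0)


def radial_coordinate(j: int, i: int, beta: float) -> float:
    """Radial coordinate r_j(i) of node j at time i >= j.

    The node is born at r_j = 2 ln j and has drifted outwards to
    beta * r_j + (1 - beta) * r_i by time i, where r_i = 2 ln i.
    """
    if not 1 <= j <= i:
        raise ValueError("need 1 <= j <= i")
    r_j = 2.0 * math.log(j)
    r_i = 2.0 * math.log(i)
    return beta * r_j + (1.0 - beta) * r_i
```

For example, `radial_coordinate(1, 8220, beta_from_gamma(2.1))` gives the radius of the first node after 8220 nodes have appeared. Older nodes stay closer to the origin than newer ones, which is what makes them attractive connection targets.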

It has been proven in [6] that the expected degree $\bar{k}_i(t)$ of a
node born at time $i$ by time $t > i$ is
$\bar{k}_i(t) \sim (i/t)^{-\beta}$. This equation says that the earlier a
node appears the higher is its expected degree. We use this observation
in HyperMap in the next section. In Figure 2(a) we use real data [7] to
validate that this equation indeed describes the trend in the evolution
of the average degree of an AS in the Internet as a function of the time
the AS appeared. To draw Figure 2(a) we took the historical data of the
twelve-year (1998-2010) evolution of the AS Internet from [7], and for
each AS we found the time $i$ (number of nodes present in the network)
when the AS first appeared in the data. Then, for all ASs that appeared
at time $i$, and which were present at the end of the measurement period
(where $t = 33796$ nodes), we calculated their average degree
$\bar{k}_i(t)$. For the theoretical formula we used $\gamma = 2.1$, i.e.,
the $\gamma$ of the AS Internet [1].
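To make the scaling concrete, the short Python sketch below compares the expected degrees of two nodes born at different times. It assumes the scaling reads $\bar{k}_i(t) \propto (i/t)^{-\beta}$ with $\beta = 1/(\gamma - 1)$, so that $t$ cancels in the ratio; the function name is hypothetical.

```python
def expected_degree_ratio(i_early: int, i_late: int, gamma: float) -> float:
    """Ratio k_early(t) / k_late(t), assuming k_i(t) ~ (i / t) ** (-beta).

    The common factor t ** beta cancels, leaving (i_late / i_early) ** beta.
    """
    beta = 1.0 / (gamma - 1.0)
    return (i_late / i_early) ** beta
```

With $\gamma = 2.1$, calling `expected_degree_ratio(10, 1000, 2.1)` returns $100^{1/1.1}$, i.e., under this assumption a node born at time 10 is expected to have a much higher degree than one born at time 1000.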

The PSO model reproduces not only the degree distribution and clustering
of real networks, but also many other important properties [6]. Given
the ability of the PSO model to construct growing synthetic networks that
resemble real networks, we show that it is possible to reverse the
synthesis, and given a real network, to map (embed) the network into the
hyperbolic plane, in a way congruent with the PSO model.



3. THE MAPPING METHOD (HYPERMAP) 

Given a scale-free network with $t$ nodes, average node degree
$\bar{k}$, power law exponent $\gamma > 2$, temperature $T \in [0, 1)$,
and adjacency matrix $\{\alpha_{ij}\}$ ($\alpha_{ij} = \alpha_{ji} = 1$
if there is a link between nodes $i$ and $j$, and
$\alpha_{ij} = \alpha_{ji} = 0$ otherwise), HyperMap computes radial and
angular coordinates $r_i(t)$, $\theta_i$, for all nodes $i \le t$ as
shown in Figure 1.



1: Sort node degrees in decreasing order $k_1(t) \ge k_2(t) \ge \ldots \ge k_t(t)$, with ties broken arbitrarily.
2: Call node $i$, $i = 1, 2, \ldots, t$, the node with degree $k_i(t)$.
3: Node $i = 1$ is born, assign to it initial radial coordinate $r_1 = 0$ and random angular coordinate $\theta_1 \in [0, 2\pi]$.
4: for $i = 2$ to $t$ do
5:     Node $i$ is born, assign to it initial radial coordinate $r_i = 2 \ln i$.
6:     Increase the radial coordinate of every existing node $j < i$ according to $r_j(i) = \beta r_j + (1 - \beta) r_i$.
7:     Assign to node $i$ angular coordinate $\theta_i$ maximizing $L_i$ given by Eq. (1).
8: end for

Figure 1: The HyperMap Embedding Algorithm.

Specifically, HyperMap first estimates the order in which the nodes of
the network are born. Since, according to the PSO model, the earlier a
node appears the higher its expected degree, HyperMap first computes the
degree of every node in the network and then sorts the node degrees in
decreasing order
$k_1(t) \ge k_2(t) \ge \ldots \ge k_i(t) \ge \ldots \ge k_t(t)$, with
ties broken arbitrarily, thus creating a sequence of node birth times
$i = 1, 2, \ldots, t$, corresponding to nodes with degrees
$k_1(t), k_2(t), \ldots, k_i(t), \ldots, k_t(t)$. We call the node born
at time $i$ node $i$. Having a sequence of node birth times, HyperMap
replays the geometric growth of the network in accordance with the PSO
model as follows. When a node is born at time $1 \le i \le t$, it is
assigned an initial radial coordinate $r_i = 2 \ln i$, and every existing
node $j < i$ moves increasing its radial coordinate according to
$r_j(i) = \beta r_j + (1 - \beta) r_i$, with $\beta = 1/(\gamma - 1)$. To
compute the angular coordinate $\theta_i$ of a new node $i$, we first
define the likelihood $L_i$:



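$$L_i = \prod_{1 \le j < i} p(x_{ji})^{\alpha_{ji}} \left[1 - p(x_{ji})\right]^{1 - \alpha_{ji}} \qquad (1)$$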



where $x_{ji}$ is the hyperbolic distance between node $i$ and existing
node $j$, $p(x_{ji})$ is the connection probability introduced in the
previous section, and $\alpha_{ji}$ is the network adjacency matrix.
Likelihood $L_i$ is the probability that the given set of connections
between new node $i$ and existing nodes $j < i$ takes place in the PSO
model. This likelihood is a function of $\theta_i$, since $x_{ji}$
depends on $\theta_i$, $p(x_{ji})$ depends on $x_{ji}$, and $L_i$ depends
on $p(x_{ji})$. The best value for $\theta_i$ is then the value that
maximizes $L_i$. The maximization can be performed numerically, by
sampling the likelihood $L_i$ at different values of $\theta$ in
$[0, 2\pi]$ separated by intervals $\Delta\theta = O(1/i)$, and then
setting $\theta_i$ to the value of $\theta$ that yields the largest value
of $L_i$. Since this requires $O(i)$ sampled values of $\theta$, and to
compute $L_i$ for a given $\theta$ we need to compute the connection
probability between node $i$ and all existing nodes $j < i$, we need a
total of $O(i^2)$ steps to perform the maximization. We note that since
$L_i$ is sampled at $\theta$ values separated by $O(1/i)$ intervals, the
maximization is approximate and becomes more precise as $i$ increases.
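The procedure of Figure 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the connection probability is assumed to have the Fermi-Dirac form $1/(1 + e^{(x - R_i)/(2T)})$ with $0 < T < 1$, and the default cutoff $R_i = r_i$ is a placeholder rather than the cutoff of [6]. The names `hypermap`, `connection_prob`, and `cutoff` are hypothetical. Hyperbolic distances use the hyperbolic law of cosines.

```python
import math
import random
from collections import defaultdict


def hyperbolic_distance(r1, th1, r2, th2):
    """Distance in the hyperbolic plane (curvature -1), law of cosines."""
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(th1 - th2))
    return math.acosh(max(arg, 1.0))  # clamp guards against rounding below 1


def connection_prob(x, R, T):
    """ASSUMPTION: Fermi-Dirac form 1 / (1 + exp((x - R) / (2T))), T > 0."""
    z = (x - R) / (2.0 * T)
    return 0.0 if z > 700.0 else 1.0 / (1.0 + math.exp(z))


def hypermap(edges, gamma, T, cutoff=lambda i, r_i: r_i, seed=0):
    """Return {node: (r, theta)} by replaying growth in degree order."""
    rng = random.Random(seed)
    beta = 1.0 / (gamma - 1.0)
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    # Steps 1-2: estimated birth order is decreasing degree.
    order = sorted(nbrs, key=lambda n: len(nbrs[n]), reverse=True)
    # Step 3: node 1 gets r_1 = 0 and a random angle.
    birth_r, thetas = [0.0], [rng.uniform(0.0, 2.0 * math.pi)]
    for i in range(2, len(order) + 1):
        r_i = 2.0 * math.log(i)                                      # step 5
        moved = [beta * rj + (1.0 - beta) * r_i for rj in birth_r]   # step 6
        linked = [order[j] in nbrs[order[i - 1]] for j in range(i - 1)]
        R = cutoff(i, r_i)  # hypothetical cutoff R_i (placeholder default)
        n_samples = math.ceil(2.0 * math.pi * i)  # angular spacing about 1/i
        best_theta, best_ll = 0.0, -math.inf
        for s in range(n_samples):                                   # step 7
            theta = 2.0 * math.pi * s / n_samples
            ll = 0.0  # log of L_i in Eq. (1)
            for j in range(i - 1):
                x = hyperbolic_distance(r_i, theta, moved[j], thetas[j])
                p = min(max(connection_prob(x, R, T), 1e-12), 1.0 - 1e-12)
                ll += math.log(p) if linked[j] else math.log(1.0 - p)
            if ll > best_ll:
                best_theta, best_ll = theta, ll
        birth_r.append(r_i)
        thetas.append(best_theta)
    r_t = birth_r[-1]
    # Final radii r_j(t) after all t nodes have been born.
    return {order[k]: (beta * birth_r[k] + (1.0 - beta) * r_t, thetas[k])
            for k in range(len(order))}
```

The sketch maximizes the logarithm of $L_i$ rather than $L_i$ itself, which has the same maximizer and avoids numerical underflow of the product. Each new node costs $O(i^2)$ operations, so the whole embedding is cubic in the number of nodes.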

Since the PSO model can construct growing synthetic networks that 
resemble real networks, we expect HyperMap to be able to accurately 
map a given real network into the hyperbolic plane, in a way congruent 
with the PSO model. In the next section we use the AS Internet topol- 
ogy to show that this is indeed the case. We note that HyperMap uses 
the current network adjacency matrix in Equation (1) to find the best
estimate for the angular position of each node, and does not require 
any knowledge about whether nodes/links were departing while the 
network was evolving, or whether connections between some nodes 
might have been internal (i.e., took place some time after the nodes 
appeared). More details related to these remarks and to the method 
will appear in a longer version of this paper. 

4. VALIDATION 



After mapping a network with $t$ nodes, we have the radial and angular
coordinates $r_i(t)$, $\theta_i$, for all nodes $i \le t$, and therefore,
we can compute the hyperbolic distance between every pair of nodes. To
evaluate how well HyperMap maps the network we use two metrics: (i) the
connection probability $p(x(t))$, which is the probability that there is
a link between a pair of nodes given their hyperbolic distance $x(t)$ at
time $t$; and (ii) the distance distribution $d(x, t)$, which is the
percentage of node pairs whose hyperbolic distance at time $t$ is $x$.
After mapping a network we compute these two metrics and juxtapose them
against our theoretical predictions for networks growing according to
the PSO model, given below:
AS Internet. We use the AS Internet topology [8] of December 2009,
available at [9]. The connections in this topology are not physical but
logical, representing AS relationships [9]. We consider the topology
consisting of all nodes (ASs) with degree greater than 2. There are
$t = 8220$ such nodes, and the topology has a power law degree
distribution with exponent $\gamma = 2.1$, average node degree
$\bar{k} = 9.45$ and average clustering $\bar{c} = 0.60$. We map the
topology using HyperMap with different values of the temperature $T$,
and show the results in Figures 2(b),(c). From the figures we observe
that HyperMap is remarkably accurate. Further, different $T$ values give
approximately the same results. In particular, all the $T$ values give a
connection probability that is best matched theoretically (using
Eq. (2)) with $T = 0.8$. These results are interesting as they imply
that in practice HyperMap is not very sensitive to the exact value of the
input parameter $T$. As mentioned in Section 2, $\bar{c}$ is controlled
by $T$. But given $\bar{c}$ and the network topology there is no formula
that can be used to infer $T$. However, our results above suggest that we
can find $T$ for a real network experimentally, by embedding the network
using different values for $T$, and then using Equation (2) to find the
$T$ value that best matches the empirical connection probability.
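The two empirical metrics can be estimated from an embedding by binning pairwise hyperbolic distances, as in the Python sketch below. The function name `distance_metrics`, the bin width, and the coordinate format `{node: (r, theta)}` are illustrative assumptions.

```python
import math
from itertools import combinations


def hyperbolic_distance(r1, th1, r2, th2):
    """Distance in the hyperbolic plane (curvature -1), law of cosines."""
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(th1 - th2))
    return math.acosh(max(arg, 1.0))


def distance_metrics(coords, edges, bin_width=1.0):
    """Return {bin_start: (connection probability, fraction of pairs)}.

    coords maps node -> (r, theta); edges is an iterable of node pairs.
    """
    edge_set = {frozenset(e) for e in edges}
    pairs, links = {}, {}
    for u, v in combinations(coords, 2):
        x = hyperbolic_distance(*coords[u], *coords[v])
        b = int(x // bin_width)
        pairs[b] = pairs.get(b, 0) + 1
        if frozenset((u, v)) in edge_set:
            links[b] = links.get(b, 0) + 1
    total = sum(pairs.values())
    return {b * bin_width: (links.get(b, 0) / n, n / total)
            for b, n in sorted(pairs.items())}
```

The first entry of each tuple estimates $p(x(t))$ for that distance bin and the second estimates $d(x, t)$. Repeating the computation for embeddings obtained with different $T$ values gives the empirical curves to compare against the theoretical prediction.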

In Figure 3 we demonstrate that HyperMap produces meaningful results.
The figure shows the angular distribution of ASs that belong to the same
country, for 13 different countries. The AS-to-country mapping is taken
from the CAIDA AS ranking project [10]. We observe that even though
HyperMap is completely geography-agnostic, it discovers meaningful
groups or communities of ASs belonging to the same country. The reason
for this is that ASs belonging to the same country are usually connected
more densely than the rest of the world, and HyperMap correctly places
all such ASs in narrow regions, close to each other. However, as
expected, due to significant geographic spread in ASs belonging to the
US, these ASs are spread widely over $[0^\circ, 360^\circ]$. We note that
other reasons besides geographic proximity may affect the connectivity
between ASs, such as economic and/or political reasons. HyperMap does
not favor any specific reason but relies only on the connectivity
between ASs in order to place the ASs at the right angular (and
consequently hyperbolic) distances.

Figure 3: HyperMap yields meaningful results.

5. APPLICATION TO THE PREDICTION OF 
MISSING LINKS 

Topology measurements of many real networks, not only of the Internet
[11], may miss some links. The prediction of missing links is a
fundamental problem that attempts to estimate the likelihood of the
existence of a missing link between two nodes in a network, based on the
observed links and/or the attributes of nodes. The standard way to
evaluate a link prediction technique is to randomly remove a percentage
of links from a given network topology, and then apply the technique to
this incomplete data to see how well the removed links can be predicted.
The standard metrics used to quantify the accuracy of a link prediction
technique are the Area Under the Receiver Operating Characteristic Curve
(AUC) and Precision [3]. A link prediction algorithm assigns to each
non-observed link $(i, j)$ a score $s_{ij}$ to quantify its existence
likelihood. The prediction algorithm then orders all the non-observed
links according to their scores, from the best score to the worst score.
The AUC is the probability that a randomly chosen missing link is given
a better score than a randomly chosen nonexistent link. If we consider
only the top-$L$ links from the ordered list, among which $L_r$ links
turn out to be right (i.e., indeed missing), then the Precision is the
ratio $L_r / L$. Below, to compute Precision we use $L = 100$ (as used
in [3]).
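The two metrics can be computed as in the following Python sketch. The function names are illustrative. Counting tied scores as half a win in the AUC is a common convention assumed here, since the text does not say how ties are handled.

```python
def auc(missing_scores, nonexistent_scores, lower_is_better=True):
    """Probability that a random missing link outscores a random
    nonexistent link; ties count as half a win (assumed convention)."""
    sign = 1.0 if lower_is_better else -1.0
    wins = 0.0
    for m in missing_scores:
        for n in nonexistent_scores:
            if sign * m < sign * n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(missing_scores) * len(nonexistent_scores))


def precision_at(ranked_links, missing_links, L=100):
    """Precision L_r / L over the top-L links of a best-first ranking."""
    missing = {frozenset(link) for link in missing_links}
    L_r = sum(1 for link in ranked_links[:L] if frozenset(link) in missing)
    return L_r / L
```

The exhaustive double loop in `auc` is quadratic in the number of links; for large topologies the same probability is usually estimated by sampling random (missing, nonexistent) pairs.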

Performance of HyperMap. To check how effective HyperMap is in
predicting missing links, we first remove 30% of links from the AS
Internet topology and then embed the resulting topology using the method
with $T = 0.8$. After the embedding, the score $s_{ij}$ between a
disconnected pair of nodes $i$ and $j$, i.e., the score of each
non-observed link $(i, j)$, is the hyperbolic distance $x_{ij}$ between
the nodes $i$ and $j$.

The smaller this score, i.e., the smaller the hyperbolic distance
between the two nodes, the more likely it is that a link between these
two nodes is missing, since the connection probability $p(x_{ij})$ is a
decreasing function of $x_{ij}$. Both AUC and Precision in HyperMap are
high, AUC = 0.95 and Precision = 0.71, indicating that the method has
strong predictive power.
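The scoring step can be sketched in Python as follows: every non-observed pair is scored by its hyperbolic distance and the pairs are sorted best-first (smallest distance first). The function name `rank_candidate_links` and the coordinate format `{node: (r, theta)}` are illustrative assumptions.

```python
import math
from itertools import combinations


def hyperbolic_distance(r1, th1, r2, th2):
    """Distance in the hyperbolic plane (curvature -1), law of cosines."""
    arg = (math.cosh(r1) * math.cosh(r2)
           - math.sinh(r1) * math.sinh(r2) * math.cos(th1 - th2))
    return math.acosh(max(arg, 1.0))


def rank_candidate_links(coords, observed_edges):
    """Return [(distance, (u, v)), ...] for non-observed pairs,
    sorted so that the most likely missing links come first."""
    observed = {frozenset(e) for e in observed_edges}
    scored = [(hyperbolic_distance(*coords[u], *coords[v]), (u, v))
              for u, v in combinations(coords, 2)
              if frozenset((u, v)) not in observed]
    scored.sort(key=lambda item: item[0])
    return scored
```

The first $L$ entries of the returned list are the links used for Precision, and the full list of scores feeds the AUC computation.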

6. FUTURE WORK 



There are several directions for future work. One is to further explore
and understand HyperMap's ability to predict missing links. Another is
to find efficient ways to reduce the running time of HyperMap without
compromising the embedding accuracy. Finally, it would be interesting to
explore the efficiency of HyperMap for other tasks, such as community
detection (see Figure 3), or the challenging problem of predicting
future links in different evolving networks.